Abstract

Benford's law is frequently used to evaluate the likelihood that data is
misrepresentative. Typically, statistical tests measure this likelihood.
Another method of employing Benford's law is to compare the frequency of
leading digits to the probabilities of leading digits over a subset of the
natural numbers. This paper proposes using the probabilities of leading
digits from uniformly distributed natural numbers to establish interval
criteria for when to look more closely into the possibility of
misrepresentative data.

Benford's law gives a probability distribution for the frequency of the
leading digit of natural numbers. Simon Newcomb described the rule for decimal
representation of natural numbers in 1881 [3], and Frank Benford generalized
Newcomb's observations to any base in 1938 [1]. In 1995, Theodore Hill used the
mantissa σ-algebra to further extend the leading-digit law to real numbers. The
mantissa σ-algebra consists of sets of numbers with the same coefficient in
scientific notation after truncation [2].

Definition 1.1 (Benford's Law). In base b, the probability that the leading digit
of a real number is k is given by

\[ P(k) = \log_b\left(1 + \frac{1}{k}\right), \quad k \in \{1, 2, 3, \ldots, b-1\}. \tag{1.1} \]

In decimal representation (base 10), the probabilities of each of the leading
digits are given by

\[ P(k) = \log_{10}\left(1 + \frac{1}{k}\right), \quad k \in \{1, 2, 3, 4, 5, 6, 7, 8, 9\}, \tag{1.2} \]



which approximately gives: 
k      1      2      3      4      5      6      7      8      9
P(k)   0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
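
The tabulated values can be recomputed directly from (1.2). The following is a minimal Python sketch; the function name `benford_probability` and the variable `benford_table` are illustrative and do not come from the paper.

```python
import math


def benford_probability(k: int, base: int = 10) -> float:
    """Return log_base(1 + 1/k), the Benford probability of leading digit k."""
    if not 1 <= k <= base - 1:
        raise ValueError("k must satisfy 1 <= k <= base - 1")
    return math.log(1 + 1 / k, base)


# Base-10 probabilities rounded to three decimals, as in the table.
benford_table = {k: round(benford_probability(k), 3) for k in range(1, 10)}
print(benford_table)
```

Because the arguments 1 + 1/k multiply to b over k = 1, ..., b - 1, the unrounded probabilities sum to 1 in any base.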



The law goes further to say that the probability distribution of digits after the
leading digit converges to uniform as the digit's position moves to the right
[1]. Benford's law does not apply to several types of numeric data, such as
identification numbers.



2 Benford-Newcomb Subsequences 

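Consider the map that sends natural numbers to their leading digits,

\[ \Lambda : \mathbb{N} \to \{1, 2, 3, \ldots, b-1\}, \quad x \mapsto \left\lfloor \frac{x}{b^{\lfloor \log_b x \rfloor}} \right\rfloor. \tag{2.1} \]

Let $\mu_N$ be the uniform probability measure on $\mathbb{N}$ supported on
{1, 2, 3, ..., N}, that is, $\mu_N(k) = \frac{1}{N}$ for all k in
{1, 2, 3, ..., N}. We use $\mu_N$ to construct a probability measure of leading
digits,

\[ P_{bN}(k) = \mu_N(\{x \in \mathbb{N} \mid \Lambda(x) = k\}). \tag{2.2} \]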
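
To make the construction concrete, the leading-digit measure can be computed by direct counting. This is a minimal Python sketch under stated assumptions: the function names `leading_digit` and `leading_digit_probability` are illustrative, and the leading digit is found by repeated integer division rather than by a floating-point logarithm, which avoids rounding errors near powers of the base.

```python
from fractions import Fraction


def leading_digit(x: int, base: int = 10) -> int:
    """Leading digit of a positive integer x written in the given base."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    while x >= base:
        x //= base  # drop the last digit until one digit remains
    return x


def leading_digit_probability(k: int, n_max: int, base: int = 10) -> Fraction:
    """Fraction of the integers 1..n_max whose leading digit is k."""
    count = sum(1 for x in range(1, n_max + 1) if leading_digit(x, base) == k)
    return Fraction(count, n_max)


print(leading_digit_probability(1, 19))  # 11/19
print(leading_digit_probability(1, 99))  # 1/9
```

The two printed values already show the oscillation: among 1, ..., 19 the digit 1 leads 11 times, while among 1, ..., 99 it still leads only 11 times.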

For a fixed base b and fixed leading digit k, consider the sequence
$(P_{bN}(k))_{N=1}^{\infty}$; in general such sequences do not converge. The
purpose of this paper is to propose using intervals of the form

\[ \left[ \liminf_{N \to \infty} P_{bN}(k), \ \limsup_{N \to \infty} P_{bN}(k) \right] \tag{2.3} \]

to identify possibly fraudulent data. If a data set's frequency of leading digits,
in base b representation, is not contained in these intervals, then look further
into the possibility of tampered data. For N > b, the local minima with respect
to N are of the form

\[ P_{bN}(k) = \frac{1 + b + \cdots + b^{n-1}}{k b^n - 1} = \frac{b^n - 1}{(b-1)(k b^n - 1)}, \quad N = k b^n - 1, \tag{2.4} \]

and the local maxima are of the form

\[ P_{bN}(k) = \frac{1 + b + \cdots + b^{n}}{(k+1) b^n - 1} = \frac{b^{n+1} - 1}{(b-1)((k+1) b^n - 1)}, \quad N = (k+1) b^n - 1. \tag{2.5} \]
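
The values at N = k b^n - 1 and N = (k+1) b^n - 1 can be checked by brute force. The Python sketch below assumes closed forms obtained by counting: the integers below k b^n include 1 + b + ... + b^(n-1) with leading digit k, and the integers below (k+1) b^n include 1 + b + ... + b^n. All function names are illustrative and not from the paper.

```python
from fractions import Fraction


def leading_digit(x: int, base: int = 10) -> int:
    """Leading digit of a positive integer x written in the given base."""
    while x >= base:
        x //= base
    return x


def prob(k: int, n_max: int, base: int = 10) -> Fraction:
    """Fraction of the integers 1..n_max whose leading digit is k."""
    count = sum(1 for x in range(1, n_max + 1) if leading_digit(x, base) == k)
    return Fraction(count, n_max)


def closed_form_min(k: int, n: int, base: int = 10) -> Fraction:
    """Assumed value at N = k*base**n - 1 (a local minimum)."""
    return Fraction(base**n - 1, (base - 1) * (k * base**n - 1))


def closed_form_max(k: int, n: int, base: int = 10) -> Fraction:
    """Assumed value at N = (k+1)*base**n - 1 (a local maximum)."""
    return Fraction(base**(n + 1) - 1, (base - 1) * ((k + 1) * base**n - 1))


def check_extrema(base: int = 10, max_exponent: int = 3) -> bool:
    """Compare the closed forms with direct counts for every digit k."""
    for k in range(1, base):
        for n in range(1, max_exponent + 1):
            if prob(k, k * base**n - 1, base) != closed_form_min(k, n, base):
                return False
            if prob(k, (k + 1) * base**n - 1, base) != closed_form_max(k, n, base):
                return False
    return True


print(check_extrema(base=10))
```

Exact rational arithmetic with `Fraction` is used so that the comparison is an equality test rather than a floating-point tolerance check.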

Thus if the frequencies of a data set's leading digits are not within

\[ \left[ \frac{1}{k(b-1)}, \ \frac{b}{(k+1)(b-1)} \right], \tag{2.6} \]

further inquiry is called for. The advantage of the interval method is that one
may use it to quickly screen data.
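
A screening step of this kind might look as follows in Python. This is only a sketch: it assumes the interval endpoints 1/(k(b-1)) and b/((k+1)(b-1)), the limits of the local minima and maxima as n grows, and the names `screening_interval` and `digits_outside_interval` are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def leading_digit(x: int, base: int = 10) -> int:
    """Leading digit of a positive integer x written in the given base."""
    while x >= base:
        x //= base
    return x


def screening_interval(k: int, base: int = 10) -> tuple[Fraction, Fraction]:
    """Assumed interval [1/(k(b-1)), b/((k+1)(b-1))] for leading digit k."""
    return Fraction(1, k * (base - 1)), Fraction(base, (k + 1) * (base - 1))


def digits_outside_interval(data: list[int], base: int = 10) -> dict[int, Fraction]:
    """Map each leading digit whose observed frequency leaves its interval
    to that frequency. An empty result means nothing was flagged."""
    if not data:
        raise ValueError("data must be non-empty")
    counts = Counter(leading_digit(x, base) for x in data)
    flagged = {}
    for k in range(1, base):
        freq = Fraction(counts.get(k, 0), len(data))
        low, high = screening_interval(k, base)
        if not low <= freq <= high:
            flagged[k] = freq
    return flagged


# Every leading digit appears with frequency 1/9 in 1..999, so nothing is flagged.
print(digits_outside_interval(list(range(1, 1000))))  # {}
# A data set whose entries all start with 9 is flagged for every digit.
print(sorted(digits_outside_interval([9, 90, 91, 95, 900])))
```

For k = 1 in base 10 the interval runs from 1/9 to 5/9, so the check is a coarse first pass in keeping with the quick-screening use described above.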
[Figure: Benford's Law and Intervals for Base 10. Axis label: leading digits. Legend: liminf-limsup interval; Benford's law.]

The figures were constructed with R.



[Figure: Base 10 CDFs. Axis label: N, with ticks at 199, 299, 399, 499, 599. The lines show how the CDFs change with N.]